Zhijing Jin

Fluid Representations in Reasoning Models

Feb 04, 2026

BinaryPPO: Efficient Policy Optimization for Binary Classification

Feb 02, 2026

Uncovering Hidden Correctness in LLM Causal Reasoning via Symbolic Verification

Jan 29, 2026

Tracing Multilingual Representations in LLMs with Cross-Layer Transcoders

Nov 13, 2025

Taming Object Hallucinations with Verified Atomic Confidence Estimation

Nov 12, 2025

SocialHarmBench: Revealing LLM Vulnerabilities to Socially Harmful Requests

Oct 06, 2025

Difficulty-Based Preference Data Selection by DPO Implicit Reward Gap

Aug 06, 2025

Democratic or Authoritarian? Probing a New Dimension of Political Biases in Large Language Models

Jun 15, 2025

Improving Large Language Model Safety with Contrastive Representation Learning

Jun 13, 2025

Can Theoretical Physics Research Benefit from Language Agents?

Jun 06, 2025